Overview

Dataset statistics

Number of variables: 12
Number of observations: 10738
Missing cells: 251
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1006.8 KiB
Average record size in memory: 96.0 B
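
The two derived figures in the dataset statistics can be cross-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables times observations) and that the average record size is the total memory divided by the number of observations. Both assumptions reproduce the reported values, but they are not taken from the pandas-profiling source.

```python
# Figures copied from the dataset statistics in this report.
n_variables = 12
n_observations = 10738
missing_cells = 251
total_memory_kib = 1006.8

# Assumption: the percentage is taken over all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size is total memory divided by the row count.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The 251 missing cells also equal the sum of the per-variable missing counts listed under Variables (42 + 37 + 46 + 66 + 23 + 37).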

Variable types

NUM: 8
CAT: 3
BOOL: 1

Reproduction

Analysis started: 2020-12-07 11:42:36.361190
Analysis finished: 2020-12-07 11:43:19.279868
Duration: 42.92 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

customer_stay_score is highly correlated with customer_ctr_score (High correlation)
customer_ctr_score is highly correlated with customer_stay_score (High correlation)
customer_id has unique values (Unique)
customer_visit_score has unique values (Unique)
customer_ctr_score has unique values (Unique)
customer_frequency_score has unique values (Unique)
customer_affinity_score has unique values (Unique)

Variables

customer_id
Categorical

UNIQUE

Distinct count: 10738
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 83.9 KiB

Common values

Value | Count | Frequency (%)
csid_6010 | 1 | < 0.1%
csid_2717 | 1 | < 0.1%
csid_3960 | 1 | < 0.1%
csid_3746 | 1 | < 0.1%
csid_8065 | 1 | < 0.1%
csid_9298 | 1 | < 0.1%
csid_1460 | 1 | < 0.1%
csid_623 | 1 | < 0.1%
csid_9945 | 1 | < 0.1%
csid_2352 | 1 | < 0.1%
Other values (10728) | 10728 | 99.9%

Length

Max length: 10
Median length: 9
Mean length: 8.965729186
Min length: 6

customer_visit_score
Real number (ℝ≥0)

UNIQUE

Distinct count: 10738
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.060941294733624
Minimum: 0.5689647666895101
Maximum: 47.30669098267679
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.9 KiB

Quantile statistics

Minimum: 0.5689647667
5-th percentile: 7.442481958
Q1: 13.51802134
median: 18.77410921
Q3: 24.50171939
95-th percentile: 31.42445376
Maximum: 47.30669098
Range: 46.73772622
Interquartile range (IQR): 10.98369805

Descriptive statistics

Standard deviation: 7.419609076
Coefficient of variation (CV): 0.389257223
Kurtosis: -0.4065214262
Mean: 19.06094129
Median Absolute Deviation (MAD): 5.439947497
Skewness: 0.1014477924
Sum: 204676.3876
Variance: 55.05059884
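
Several of the statistics reported for customer_visit_score are simple functions of the others, and the same relationships hold for every numeric variable in this report. The sketch below recomputes them from the printed values, assuming the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared). The variable names are illustrative only.

```python
# Values copied from the customer_visit_score statistics in this report.
mean = 19.060941294733624
std = 7.419609076
minimum = 0.5689647666895101
maximum = 47.30669098267679
q1 = 13.51802134
q3 = 24.50171939

value_range = maximum - minimum   # reported as Range
iqr = q3 - q1                     # reported as Interquartile range (IQR)
cv = std / mean                   # reported as Coefficient of variation (CV)
variance = std ** 2               # reported as Variance

print(f"Range: {value_range:.8f}")
print(f"IQR: {iqr:.8f}")
print(f"CV: {cv:.9f}")
print(f"Variance: {variance:.8f}")
```

Small differences in the last printed digit are possible because the inputs are already rounded.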
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
8.206125143 | 1 | < 0.1%
12.30107767 | 1 | < 0.1%
9.818670235 | 1 | < 0.1%
20.23140584 | 1 | < 0.1%
23.18956459 | 1 | < 0.1%
18.97509644 | 1 | < 0.1%
22.67105567 | 1 | < 0.1%
29.17374567 | 1 | < 0.1%
23.56993992 | 1 | < 0.1%
19.33703771 | 1 | < 0.1%
Other values (10728) | 10728 | 99.9%

Minimum 5 values

Value | Count | Frequency (%)
0.5689647667 | 1 | < 0.1%
0.6441806855 | 1 | < 0.1%
0.6650534717 | 1 | < 0.1%
0.715215517 | 1 | < 0.1%
0.9186268439 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
47.30669098 | 1 | < 0.1%
43.92674833 | 1 | < 0.1%
43.75726982 | 1 | < 0.1%
42.34256741 | 1 | < 0.1%
42.19495825 | 1 | < 0.1%
 

customer_product_search_score
Real number (ℝ)

Distinct count: 10696
Unique (%): 100.0%
Missing: 42
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.27484715286252
Minimum: -0.16193998183198755
Maximum: 16.63824329516359
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.9 KiB

Quantile statistics

Minimum: -0.1619399818
5-th percentile: 2.262566301
Q1: 3.971586843
median: 5.218479286
Q3: 6.520363539
95-th percentile: 8.386104872
Maximum: 16.6382433
Range: 16.80018328
Interquartile range (IQR): 2.548776696

Descriptive statistics

Standard deviation: 1.882558586
Coefficient of variation (CV): 0.3568934855
Kurtosis: 0.545163275
Mean: 5.274847153
Median Absolute Deviation (MAD): 1.276309404
Skewness: 0.2892716474
Sum: 56419.76515
Variance: 3.54402683
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
5.800739967 | 1 | < 0.1%
6.580534098 | 1 | < 0.1%
7.556778851 | 1 | < 0.1%
4.6302368 | 1 | < 0.1%
3.796770983 | 1 | < 0.1%
2.641433 | 1 | < 0.1%
4.683022025 | 1 | < 0.1%
5.47722877 | 1 | < 0.1%
7.885387429 | 1 | < 0.1%
4.698243006 | 1 | < 0.1%
Other values (10686) | 10686 | 99.5%
(Missing) | 42 | 0.4%

Minimum 5 values

Value | Count | Frequency (%)
-0.1619399818 | 1 | < 0.1%
-0.04875713064 | 1 | < 0.1%
0.0644344974 | 1 | < 0.1%
0.08783044642 | 1 | < 0.1%
0.1758818048 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
16.6382433 | 1 | < 0.1%
16.63088664 | 1 | < 0.1%
15.51932408 | 1 | < 0.1%
14.65319495 | 1 | < 0.1%
14.64986837 | 1 | < 0.1%
 

customer_ctr_score
Real number (ℝ)

HIGH CORRELATION
UNIQUE

Distinct count: 10738
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.17591198060116714
Minimum: -0.5479890837946332
Maximum: 2.6794742421447224
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.9 KiB

Quantile statistics

Minimum: -0.5479890838
5-th percentile: -0.07250584954
Q1: 0.01084001214
median: 0.07407813627
Q3: 0.1596064355
95-th percentile: 1.072821684
Maximum: 2.679474242
Range: 3.227463326
Interquartile range (IQR): 0.1487664233

Descriptive statistics

Standard deviation: 0.3728289383
Coefficient of variation (CV): 2.119406177
Kurtosis: 10.96033251
Mean: 0.1759119806
Median Absolute Deviation (MAD): 0.07104629897
Skewness: 3.216021049
Sum: 1888.942848
Variance: 0.1390014172
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.1085098466 | 1 | < 0.1%
0.01584065775 | 1 | < 0.1%
0.06493509666 | 1 | < 0.1%
-0.1197484193 | 1 | < 0.1%
0.1597885255 | 1 | < 0.1%
0.1577491043 | 1 | < 0.1%
0.1069763926 | 1 | < 0.1%
0.06193393741 | 1 | < 0.1%
-0.02782028104 | 1 | < 0.1%
0.3495371962 | 1 | < 0.1%
Other values (10728) | 10728 | 99.9%

Minimum 5 values

Value | Count | Frequency (%)
-0.5479890838 | 1 | < 0.1%
-0.5462274631 | 1 | < 0.1%
-0.5384683288 | 1 | < 0.1%
-0.5339414237 | 1 | < 0.1%
-0.5324857093 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
2.679474242 | 1 | < 0.1%
2.571238413 | 1 | < 0.1%
2.57043864 | 1 | < 0.1%
2.510406547 | 1 | < 0.1%
2.390943097 | 1 | < 0.1%
 

customer_stay_score
Real number (ℝ)

HIGH CORRELATION

Distinct count: 10701
Unique (%): 100.0%
Missing: 37
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.37423006184720775
Minimum: -0.4624940639254821
Maximum: 14.701914171298233
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.9 KiB

Quantile statistics

Minimum: -0.4624940639
5-th percentile: -0.1002052956
Q1: -0.02766573337
median: 0.03720079496
Q3: 0.1790287653
95-th percentile: 2.441966972
Maximum: 14.70191417
Range: 15.16440824
Interquartile range (IQR): 0.2066944986

Descriptive statistics

Standard deviation: 1.222030798
Coefficient of variation (CV): 3.265453321
Kurtosis: 29.79532391
Mean: 0.3742300618
Median Absolute Deviation (MAD): 0.08274564002
Skewness: 5.008726307
Sum: 4004.635892
Variance: 1.493359272
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
-0.1012199454 | 1 | < 0.1%
0.1751477254 | 1 | < 0.1%
0.04995203118 | 1 | < 0.1%
-0.04555211684 | 1 | < 0.1%
0.0813363266 | 1 | < 0.1%
-0.02048455937 | 1 | < 0.1%
0.1500162078 | 1 | < 0.1%
-0.06963437964 | 1 | < 0.1%
-0.004254199539 | 1 | < 0.1%
0.1594164456 | 1 | < 0.1%
Other values (10691) | 10691 | 99.6%
(Missing) | 37 | 0.3%

Minimum 5 values

Value | Count | Frequency (%)
-0.4624940639 | 1 | < 0.1%
-0.3895174278 | 1 | < 0.1%
-0.3724808355 | 1 | < 0.1%
-0.3532560254 | 1 | < 0.1%
-0.3492770074 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
14.70191417 | 1 | < 0.1%
14.28113287 | 1 | < 0.1%
13.53972037 | 1 | < 0.1%
12.40849733 | 1 | < 0.1%
11.99354174 | 1 | < 0.1%
 

customer_frequency_score
Real number (ℝ≥0)

UNIQUE

Distinct count: 10738
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.376894687891762
Minimum: 0.028575210510020193
Maximum: 52.39501392251049
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.9 KiB

Quantile statistics

Minimum: 0.02857521051
5-th percentile: 0.1542461741
Q1: 0.3136096545
median: 0.5168299359
Q3: 1.125379515
95-th percentile: 14.11275303
Maximum: 52.39501392
Range: 52.36643871
Interquartile range (IQR): 0.8117698602

Descriptive statistics

Standard deviation: 5.601910934
Coefficient of variation (CV): 2.356819157
Kurtosis: 19.19443114
Mean: 2.376894688
Median Absolute Deviation (MAD): 0.2615149293
Skewness: 4.083012882
Sum: 25523.09516
Variance: 31.38140612
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.7903460794 | 1 | < 0.1%
1.41808916 | 1 | < 0.1%
0.691165902 | 1 | < 0.1%
0.729271526 | 1 | < 0.1%
10.02961875 | 1 | < 0.1%
0.8282612463 | 1 | < 0.1%
3.901457606 | 1 | < 0.1%
0.111588142 | 1 | < 0.1%
0.3752963629 | 1 | < 0.1%
0.4475662319 | 1 | < 0.1%
Other values (10728) | 10728 | 99.9%

Minimum 5 values

Value | Count | Frequency (%)
0.02857521051 | 1 | < 0.1%
0.03332008134 | 1 | < 0.1%
0.0355902314 | 1 | < 0.1%
0.03591151507 | 1 | < 0.1%
0.03660536941 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
52.39501392 | 1 | < 0.1%
49.67938001 | 1 | < 0.1%
49.03419464 | 1 | < 0.1%
47.81685008 | 1 | < 0.1%
46.92130903 | 1 | < 0.1%
 

customer_product_variation_score
Real number (ℝ≥0)

Distinct count: 10692
Unique (%): 100.0%
Missing: 46
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.788179528639336
Minimum: 2.7528361476216268
Maximum: 18.743835720199684
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.9 KiB

Quantile statistics

Minimum: 2.752836148
5-th percentile: 3.541244833
Q1: 4.193234472
median: 4.842574595
Q3: 6.286400327
95-th percentile: 11.66568404
Maximum: 18.74383572
Range: 15.99099957
Interquartile range (IQR): 2.093165855

Descriptive statistics

Standard deviation: 2.531309458
Coefficient of variation (CV): 0.4373239367
Kurtosis: 3.191873393
Mean: 5.788179529
Median Absolute Deviation (MAD): 0.8261130491
Skewness: 1.851646948
Sum: 61887.21552
Variance: 6.40752757
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
12.22809216 | 1 | < 0.1%
7.397833399 | 1 | < 0.1%
9.47163936 | 1 | < 0.1%
4.058211693 | 1 | < 0.1%
10.22522605 | 1 | < 0.1%
5.442010193 | 1 | < 0.1%
6.452078811 | 1 | < 0.1%
5.647095354 | 1 | < 0.1%
3.682175325 | 1 | < 0.1%
7.820523341 | 1 | < 0.1%
Other values (10682) | 10682 | 99.5%
(Missing) | 46 | 0.4%

Minimum 5 values

Value | Count | Frequency (%)
2.752836148 | 1 | < 0.1%
2.787879879 | 1 | < 0.1%
2.812295598 | 1 | < 0.1%
2.821770078 | 1 | < 0.1%
2.822925888 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
18.74383572 | 1 | < 0.1%
18.48790753 | 1 | < 0.1%
18.42971169 | 1 | < 0.1%
18.3418688 | 1 | < 0.1%
18.26678114 | 1 | < 0.1%
 

customer_order_score
Real number (ℝ≥0)

Distinct count: 10672
Unique (%): 100.0%
Missing: 66
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.150070538556626
Minimum: 0.36333795012621
Maximum: 9.09020550869893
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.9 KiB

Quantile statistics

Minimum: 0.3633379501
5-th percentile: 1.532564209
Q1: 2.454017385
median: 3.118394172
Q3: 3.756566397
95-th percentile: 4.892306349
Maximum: 9.090205509
Range: 8.726867559
Interquartile range (IQR): 1.302549012

Descriptive statistics

Standard deviation: 1.03541551
Coefficient of variation (CV): 0.3286959759
Kurtosis: 1.210741347
Mean: 3.150070539
Median Absolute Deviation (MAD): 0.6528132847
Skewness: 0.5768648974
Sum: 33617.55279
Variance: 1.072085278
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
3.61768905 | 1 | < 0.1%
4.211795453 | 1 | < 0.1%
3.771272483 | 1 | < 0.1%
3.185724287 | 1 | < 0.1%
4.350577461 | 1 | < 0.1%
2.6436982 | 1 | < 0.1%
2.527028663 | 1 | < 0.1%
3.047355745 | 1 | < 0.1%
2.231138913 | 1 | < 0.1%
3.328788488 | 1 | < 0.1%
Other values (10662) | 10662 | 99.3%
(Missing) | 66 | 0.6%

Minimum 5 values

Value | Count | Frequency (%)
0.3633379501 | 1 | < 0.1%
0.5371367527 | 1 | < 0.1%
0.5610723909 | 1 | < 0.1%
0.5692797177 | 1 | < 0.1%
0.5997546164 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
9.090205509 | 1 | < 0.1%
8.951938748 | 1 | < 0.1%
8.937619861 | 1 | < 0.1%
8.357390523 | 1 | < 0.1%
8.226249469 | 1 | < 0.1%
 

customer_affinity_score
Real number (ℝ)

UNIQUE

Distinct count: 10738
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.06183578563719
Minimum: -0.4868340562827102
Maximum: 248.55275470161067
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.9 KiB

Quantile statistics

Minimum: -0.4868340563
5-th percentile: -0.08343869615
Q1: 4.530085389
median: 12.65335707
Q3: 23.11457668
95-th percentile: 50.46467763
Maximum: 248.5527547
Range: 249.0395888
Interquartile range (IQR): 18.58449129

Descriptive statistics

Standard deviation: 18.76269336
Coefficient of variation (CV): 1.099687841
Kurtosis: 16.85754965
Mean: 17.06183579
Median Absolute Deviation (MAD): 8.987608416
Skewness: 2.993483837
Sum: 183209.9927
Variance: 352.0386622
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
18.51900811 | 1 | < 0.1%
25.13272248 | 1 | < 0.1%
29.40032001 | 1 | < 0.1%
7.766693473 | 1 | < 0.1%
14.96031205 | 1 | < 0.1%
15.74434363 | 1 | < 0.1%
23.27878962 | 1 | < 0.1%
12.63416693 | 1 | < 0.1%
12.29702508 | 1 | < 0.1%
20.1937733 | 1 | < 0.1%
Other values (10728) | 10728 | 99.9%

Minimum 5 values

Value | Count | Frequency (%)
-0.4868340563 | 1 | < 0.1%
-0.482894096 | 1 | < 0.1%
-0.473328527 | 1 | < 0.1%
-0.4556253366 | 1 | < 0.1%
-0.4542975823 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
248.5527547 | 1 | < 0.1%
246.9369655 | 1 | < 0.1%
218.4587702 | 1 | < 0.1%
206.6697283 | 1 | < 0.1%
198.923264 | 1 | < 0.1%
 

customer_active_segment
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 23
Missing (%): 0.2%
Memory size: 83.9 KiB

Common values

Value | Count | Frequency (%)
C | 4919 | 45.8%
B | 4430 | 41.3%
D | 536 | 5.0%
AA | 418 | 3.9%
A | 412 | 3.8%
(Missing) | 23 | 0.2%

Length

Max length: 3
Median length: 1
Mean length: 1.043211026
Min length: 1

X1
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 37
Missing (%): 0.3%
Memory size: 83.9 KiB

Common values

Value | Count | Frequency (%)
BA | 4511 | 42.0%
A | 2268 | 21.1%
F | 2235 | 20.8%
AA | 1611 | 15.0%
E | 76 | 0.7%
(Missing) | 37 | 0.3%

Length

Max length: 3
Median length: 2
Mean length: 1.577016204
Min length: 1

customer_category
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 83.9 KiB

Common values

Value | Count | Frequency (%)
0 | 9443 | 87.9%
1 | 1295 | 12.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only its direction does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
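
The calculation just described can be sketched in plain Python. This is a minimal illustration on made-up data, not the code pandas-profiling runs; the function name `pearson_r` and the sample lists are assumptions introduced here.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and of the two standard deviations
    # cancel, so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # exactly linear in x
r_linear = pearson_r(x, y)
# Shifting y and rescaling it by a positive factor leaves r unchanged.
r_shifted = pearson_r(x, [10.0 + 0.5 * v for v in y])
# A decreasing linear relationship gives total negative correlation.
r_negative = pearson_r(x, [-v for v in y])
print(r_linear, r_shifted, r_negative)
```

The second call illustrates the location and scale invariance mentioned above: changing the offset or the slope magnitude of a linear relationship does not change r.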

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
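
As a rough sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are assumptions for illustration, and giving tied values the average of their ranks is a common convention rather than something stated in this report.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance over the product of standard deviations (1/n factors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
tied = average_ranks([10, 20, 20, 30])
print(rho, r, tied)
```

On this cubic example ρ reaches 1 while r stays below 1, which is the sense in which ρ captures nonlinear monotonic relationships better than r.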

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
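
The pair-counting definition can be written down directly. The sketch below implements the formula exactly as stated (often called tau-a); library implementations such as `scipy.stats.kendalltau` default to the tie-corrected tau-b variant, so results can differ when the data contain ties. The function name and the data are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order the pair the same way
        elif direction < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # direction == 0 means a tie in x or y; it counts toward neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
tau_swapped = kendall_tau_a(x, [1, 3, 2, 4])   # one discordant pair out of six
tau_reversed = kendall_tau_a(x, [4, 3, 2, 1])  # every pair discordant
print(tau_swapped, tau_reversed)
```

The double loop over pairs is quadratic in the number of observations, which is fine for a demonstration but slow for a table of this size.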

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
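
A minimal sketch of a bias-corrected Cramér's V follows, using the correction as it is commonly stated for Bergsma's 2013 proposal: subtract (k-1)(r-1)/(n-1) from φ² = χ²/n, floor the result at zero, and shrink the row and column counts accordingly. Whether pandas-profiling computes it in exactly this way is an assumption; the function name and the toy data are illustrative.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    pair_counts = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table (no continuity correction).
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = pair_counts.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(phi2_corr / denom)


# Perfect association: each label of x always pairs with the same label of y.
v_perfect = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
# Independence: every combination of labels occurs equally often.
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(v_perfect, v_independent)
```

Flooring the corrected φ² at zero keeps the coefficient inside the 0 to 1 range described above.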

Missing values

Sample

First rows

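 | customer_id | customer_visit_score | customer_product_search_score | customer_ctr_score | customer_stay_score | customer_frequency_score | customer_product_variation_score | customer_order_score | customer_affinity_score | customer_active_segment | X1 | customer_category
0 | csid_1 | 13.168425 | 9.447662 | -0.070203 | -0.139541 | 0.436956 | 4.705761 | 2.537985 | 7.959503 | C | F | 0
1 | csid_2 | 17.092979 | 7.329056 | 0.153298 | -0.102726 | 0.380340 | 4.205138 | 4.193444 | 17.517381 | C | A | 0
2 | csid_3 | 17.505334 | 5.143676 | 0.106709 | 0.262834 | 0.417648 | 4.479070 | 3.878971 | 12.595155 | C | BA | 0
3 | csid_4 | 31.423381 | 4.917740 | -0.020226 | -0.100526 | 0.778130 | 5.055535 | 2.708940 | 4.795073 | AA | F | 0
4 | csid_5 | 11.909502 | 4.237073 | 0.187178 | 0.172891 | 0.162067 | 3.445247 | 3.677360 | 56.636326 | C | AA | 0
5 | csid_6 | 9.007922 | 7.051568 | 0.161564 | 0.040997 | 0.191935 | 4.209840 | 3.181961 | 18.862680 | C | BA | 0
6 | csid_7 | 13.707109 | 5.625179 | 0.009634 | -0.019998 | 0.177622 | 4.165093 | 4.689834 | 109.203352 | B | E | 0
7 | csid_8 | 32.042122 | 3.563568 | -0.050730 | NaN | 0.257060 | 4.366761 | 4.041260 | 24.036321 | AAA (see note) | 0
8 | csid_9 | 20.434181 | 5.111682 | 0.133922 | 0.036893 | 0.442314 | 4.759516 | 3.407424 | 17.078123 | C | BA | 0
9 | csid_10 | 13.778214 | 3.829299 | 0.159102 | 0.165818 | 0.558187 | 6.255980 | 3.315462 | 9.443864 | B | BA | 0

Note: in row 7 the source fuses customer_active_segment and X1 as "AAA"; whether this is AA followed by A or A followed by AA cannot be recovered from the text.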

Last rows

 | customer_id | customer_visit_score | customer_product_search_score | customer_ctr_score | customer_stay_score | customer_frequency_score | customer_product_variation_score | customer_order_score | customer_affinity_score | customer_active_segment | X1 | customer_category
10728 | csid_10729 | 24.772317 | 4.753238 | 0.019578 | -0.070097 | 0.556264 | 4.590020 | 3.126145 | 12.193862 | C | A | 0
10729 | csid_10730 | 11.657455 | 6.233353 | 0.007517 | -0.016122 | 0.476700 | 4.024655 | 2.727740 | 22.214286 | B | A | 0
10730 | csid_10731 | 18.793887 | 4.199826 | 0.143747 | 0.219525 | 0.267384 | 3.867731 | 2.893106 | 28.685574 | B | BA | 0
10731 | csid_10732 | 29.094167 | 6.391500 | -0.051283 | -0.079743 | 0.434865 | 4.791949 | 2.244512 | 6.251333 | B | BA | 0
10732 | csid_10733 | 14.664036 | 5.341811 | 0.043920 | -0.125090 | 0.269019 | 4.563034 | 3.685176 | 14.066261 | C | A | 0
10733 | csid_10734 | 23.672615 | 6.701514 | 0.092879 | -0.017332 | 1.210397 | 7.003663 | 3.027084 | 1.952911 | C | BA | 0
10734 | csid_10735 | 25.673028 | 6.497796 | 0.050216 | -0.047211 | 0.725230 | 5.407507 | 3.104172 | 5.124286 | C | BA | 0
10735 | csid_10736 | 31.676844 | 7.799880 | 0.062961 | -0.032765 | 0.318118 | 5.598486 | 2.403051 | 21.864188 | A | BA | 0
10736 | csid_10737 | 28.441780 | 5.588302 | -0.093931 | 0.081586 | 0.132177 | 3.616492 | 4.972243 | 86.969977 | B | AA | 0
10737 | csid_10738 | 20.663035 | 4.478301 | 0.253165 | 0.381349 | 0.504904 | 4.181092 | 4.469215 | 27.770899 | B | A | 0